4.5 Traveling salesman problem (TSP)

Problem:
A salesman hopes to travel around a number of towns to sell his goods. He will visit each town exactly once and finally return to the first town where he set off.  The distances between any two towns are already known. Note that the salesman’s total tour length is decided by the visit order of the towns. The goal is to find out one visit order that gives the shortest tour length.
According to properties of the pairwise distances, TSP problems are divided into various settings, such as the symmetric/asymmetric settings or the general/metric settings. We will discuss the relatively restricted symmetric metric setting here. Let dist(A,B) denote the distance of path from town A to town B. Symmetric means that dist(A,B) = dist(B,A) (i.e. there is no one-way-path-like situation) and metric means that dist(A,C) ≤ dist(A,B)+dist(B,C), for any towns A, B, C.

Approximate solution:
The collection of towns and paths between towns could be represented as a complete undirected graph. Distances between towns correspond to weights of edges in the graph.


[Figure 1] Represent the collection of towns and paths as a complete graph.

A visit sequence is a sequence of all nodes in the graph that represents one feasible solution for TSP. That is, the salesman would start at vertex , then moved to , then , …, at last the salesman moved back from to . Tour length could be calculated for each visit sequence. For example, in Figure 1, the tour length of visit sequence 1234 is 8 + 6 + 7 + 9 = 30.
Note that the optimal solution could be obtained by enumerating all n ! visit sequences (precisely it is (n-1)! / 2 in terms of symmetry). However the time complexity for enumeration is exponentially high and no nowadays computer could afford it even for slightly large n. In fact, the TSP problem is well-known for the difficulty to find out its optimal solution. Even the restricted symmetric metric version is also NP-hard, which implies that having an optimal solution in polynomial time is nearly impossible. Thus fast algorithms are designed to produce suboptimal yet fairly good solutions. Some of these algorithms’ performances could be guaranteed formally by the approximation ratio*:
An optimization algorithm is said to have approximation ratio ρ(n), if for any instance, the cost of the solution given by the algorithm is at most ρ(n) times that of the optimal solution. And the algorithm is also called a ρ(n)-approximate algorithm.
*Another often used measurement for TSP algorithms is domination number.

A 2-approximation algorithm for symmetric metric TSP:

In the following we will give a simple 2-approximate algorithm based on minimum spanning tree (MST).


[Figure 2] MST of a graph

Algorithm

Input: complete graph G
Output: visit sequence

  1. Build a minimum spanning tree T for graph G.
  2. DFS traverse tree T so that a visit sequence is generated.
  3. Return

Correctness:
To show that the algorithm is 2-approximate, it suffices to prove the following two lemmas:

Lemma 1. The tour length of optimal solution is at least the sum of weights of T’s edges.
Proof. Note that the optimal tour forms a circle C containing all vertices in graph G. Let be the sub-graph obtained by removing one edge from C. Then must be a spanning tree of G and has its edges’ sum at least that of T’s since T is the minimum spanning tree of G. Thus the tour length which is the sum of all edges’ weights in C is also at least that of T’s edges.

Lemma 2: The tour length of our algorithm’s solution is at most twice the sum of weights of T’s edges.
Proof. Consider the path formed by all the exploring and backtracking steps of DFS traversing T. The tour length of the path is exactly twice the sum of weights of T’s edges.


[Figure 3] Path formed by traversing T.

Note that our algorithm “shortcuts” the DFS traversing path by moving directly from current vertex to next newly explored vertex. Then due to the metric property, the tour length will not increase. So in total the tour length of our algorithm’s solution is at most twice the sum of weights of T’s edges.

Time Complexity:

Building a minimum spanning tree with Prim algorithm takes O(E + n log n) time where E is the number of edges and n is the number of vertices. Since in a complete graph E = n (n - 1) / 2, there is O(E + n log n) = O(n 2). The DFS traversing step will take O(n) time. So the overall time complexity is O(n 2).

© The University of Hong Kong Algorithms Library - hkual@cs.hku.hk